Add Gaussian Naive Bayes classifier in machine_learning/#14853
PRERITARYA wants to merge 4 commits into
for more information, see https://pre-commit.ci
Click here to look at the relevant links ⬇️
🔗 Relevant Links
Repository:
Python:
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
- @algorithms-keeper review to trigger the checks for only added pull request files
- @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
    return priors, summaries


def gaussian_log_probability(x: float, mean: float, variance: float) -> float:
Please provide descriptive name for the parameter: x
Pull request overview
Adds a from-scratch Gaussian Naive Bayes classifier implementation under machine_learning/, intended to provide a lightweight probabilistic classifier without external ML dependencies.
Changes:
- Introduces training helpers to compute per-class priors and per-feature Gaussian summaries (mean/variance).
- Implements log-space Gaussian likelihood scoring for stable prediction.
- Adds doctests for core helpers and an executable doctest.testmod() entrypoint.
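
To illustrate why log-space scoring is described as stable, here is a minimal sketch that compares multiplying raw Gaussian densities with summing their logarithms. The helper names `gaussian_pdf` and `gaussian_log_pdf` and the 400-feature input are assumptions made for the demonstration, not code from this PR.

```python
import math


def gaussian_pdf(value: float, mean: float, variance: float) -> float:
    """Gaussian density evaluated directly (hypothetical helper)."""
    exponent = -((value - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


def gaussian_log_pdf(value: float, mean: float, variance: float) -> float:
    """Logarithm of the same density, computed without calling exp()."""
    return -0.5 * math.log(2 * math.pi * variance) - (value - mean) ** 2 / (2 * variance)


# 400 features, each three standard deviations away from the mean.
values = [3.0] * 400

# Multiplying raw densities underflows to exactly 0.0 in double precision.
product = 1.0
for value in values:
    product *= gaussian_pdf(value, 0.0, 1.0)

# Summing log-densities stays finite and still ranks classes correctly.
log_sum = sum(gaussian_log_pdf(value, 0.0, 1.0) for value in values)

print(product)  # 0.0
print(math.isfinite(log_sum))  # True
```

Once every class scores 0.0 the product form can no longer separate them, while the log-sum keeps the ordering intact.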
    n_samples = len(data)
    separated = separate_by_class(data, labels)

    priors: dict[int, float] = {}
    summaries: dict[int, list[tuple[float, float]]] = {}

    for class_label, class_samples in separated.items():
        priors[class_label] = math.log(len(class_samples) / n_samples)
        # transpose to get per-feature lists
        features_by_column = [
            [row[col] for row in class_samples] for col in range(len(class_samples[0]))
        ]
        summaries[class_label] = [
            compute_mean_variance(column) for column in features_by_column
        ]
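
The excerpt above calls compute_mean_variance on each feature column, but its body is not shown in this conversation. A minimal sketch of what such a helper could look like follows. It assumes population variance (dividing by n) and assumes that the 1e-9 variance clamp mentioned in review is applied inside this helper; both are guesses, and `MIN_VARIANCE` is a hypothetical name.

```python
MIN_VARIANCE = 1e-9  # assumed floor; the review only mentions clamping to 1e-9


def compute_mean_variance(values: list[float]) -> tuple[float, float]:
    """Return (mean, variance) for one feature column.

    Assumption: population variance, floored at MIN_VARIANCE so that a
    constant column never yields a zero variance downstream.
    """
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, max(variance, MIN_VARIANCE)


print(compute_mean_variance([1.0, 2.0, 3.0]))
print(compute_mean_variance([5.0, 5.0, 5.0]))  # (5.0, 1e-09)
```

The floor matters because a constant feature would otherwise produce a division by zero when the Gaussian log-density is evaluated.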
    for class_label, feature_summaries in summaries.items():
        score = priors[class_label]
        for feature_value, (mean, variance) in zip(feature_vector, feature_summaries):
            score += gaussian_log_probability(feature_value, mean, variance)
    if not predictions:
        raise ValueError("Inputs must not be empty.")
    if len(predictions) != len(actual):
        raise ValueError("Predictions and actual labels must have the same length.")
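
The two checks above guard the accuracy helper against empty and mismatched inputs. A minimal sketch of the full function is given below. It assumes the result is returned as a fraction between 0 and 1; the PR may instead return a percentage, since the return statement is not visible in this excerpt.

```python
def accuracy(predictions: list[int], actual: list[int]) -> float:
    """Fraction of predictions that match the actual labels (assumed scale: 0 to 1)."""
    if not predictions:
        raise ValueError("Inputs must not be empty.")
    if len(predictions) != len(actual):
        raise ValueError("Predictions and actual labels must have the same length.")
    # Booleans sum as 0/1, so this counts the matching positions.
    correct = sum(predicted == expected for predicted, expected in zip(predictions, actual))
    return correct / len(predictions)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```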
zuhairkazmi14 left a comment
Excellent and highly readable implementation of Gaussian Naive Bayes! The math and logarithmic probabilities are implemented correctly, and clamping the variance to 1e-9 is a great safety measure. A quick recommendation for transposing the feature lists in the train() function: instead of nested list comprehensions, you can write it in a more Pythonic and performant way using zip: features_by_column = [list(col) for col in zip(*class_samples)]. Keep up the great work!
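
For reference, a small sketch comparing the two transposition styles discussed in this comment. The sample rows are made up for the demonstration; only the two expressions themselves come from the PR and the suggestion.

```python
# Three samples with two features each (hypothetical data).
class_samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# Original approach: index every row once per column.
nested = [
    [row[col] for row in class_samples] for col in range(len(class_samples[0]))
]

# Suggested approach: unpack the rows and let zip group values by position.
zipped = [list(col) for col in zip(*class_samples)]

print(nested)  # [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(nested == zipped)  # True
```

The two forms agree on rectangular input but differ at the edges: zip silently truncates to the shortest row when rows have unequal lengths, and it returns an empty list for empty input, where the nested version raises IndexError on class_samples[0].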
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
    return priors, summaries


def gaussian_log_probability(x: float, mean: float, variance: float) -> float:
Please provide descriptive name for the parameter: x
Thank you for the kind words and the suggestion! Refactored the transposition to use zip(*class_samples), which is much more Pythonic. Please take another look!
Describe your change:
Add Gaussian Naive Bayes classifier implemented from scratch without any
external ML libraries (no sklearn).
Implements the full pipeline:
- separate_by_class: splits training data by class label
- compute_mean_variance: computes per-feature Gaussian statistics
- train: fits priors and per-class feature summaries
- gaussian_log_probability: evaluates the Gaussian PDF in log space
- predict / predict_single: classifies new samples
- accuracy: evaluates classifier performance

Checklist: